Addressing

Boreal

Much more important than memorizing a bunch of rarely used instructions is learning the various ways that memory can be addressed. Registers can only hold a few words of data. To handle more data, random access memory (RAM) must be used.

The DECOUT example stored data as part of an instruction (see DECOUT.ASM). This is called immediate-mode addressing:

                mov     ax, 12345

Immediate mode is useful when the data is constant, but we often want to deal with values that can change. These values are called variables, and here's how they're coded:

        number  dw      6502            ;Define a Word of data
                . . .
                mov     ax, [number]    ;get what's stored in "number"
                call    decout          ;display its value

"Number" is the label for the memory location that contains the value 6502. Two bytes are needed to hold a value this large. These two bytes form a "word". The PC uses the convention of storing the low byte first. For example, if we convert 6502 to hex, we get 1966h. The 66h will be stored in the first (lower address) byte, and the 19h will be in the second byte.

This is called the "little-endian" convention. It's what Intel uses. Motorola, on the other hand, uses the "big-endian" convention, which stores the high byte first. These colorful terms come from Jonathan Swift's "Gulliver's Travels". The Big-Endians were a group of people who opposed the Emperor's decree that eggs should be broken at the smaller end before they were eaten.
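The byte order is easy to check outside of assembly. Here is a minimal Python sketch (not part of the original DECOUT code) that packs the value 6502 both ways using the standard struct module:

```python
import struct

value = 6502                        # same value as "number dw 6502"

little = struct.pack("<H", value)   # low byte first (the Intel convention)
big = struct.pack(">H", value)      # high byte first (the Motorola convention)

print(hex(value))                   # 0x1966
print(little.hex())                 # 6619
print(big.hex())                    # 1966
```

The little-endian result puts 66h at the lower address and 19h after it, which is the layout described above.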

The brackets around [number] are optional in MASM, but omitting them leads to confusion, and other assemblers, such as NASM, require them. Without brackets, the instruction doesn't show whether it fetches the contents of a memory location or loads an immediate-mode constant. This is because "number" can be defined two ways: "number dw 6502" or "number equ 6502".
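The difference between the two definitions can be modeled in a few lines of Python. This is only a sketch: the memory array, the address 10h, and the names NUMBER_ADDR and NUMBER_EQU are made up for illustration, and the real assembler picks the address itself.

```python
import struct

memory = bytearray(64)              # hypothetical scrap of RAM
NUMBER_ADDR = 0x10                  # assumed address given to "number"
struct.pack_into("<H", memory, NUMBER_ADDR, 6502)    # number dw 6502

NUMBER_EQU = 6502                   # number equ 6502: a constant, no storage

def load_word(addr):
    """Model of mov ax, [addr]: fetch the 16-bit word stored at addr."""
    return struct.unpack_from("<H", memory, addr)[0]

ax_from_memory = load_word(NUMBER_ADDR)   # dw case: contents of a location
ax_immediate = NUMBER_EQU                 # equ case: value built into the code

# Only the variable can change while the program runs.
struct.pack_into("<H", memory, NUMBER_ADDR, 8086)
ax_after_store = load_word(NUMBER_ADDR)

print(ax_from_memory, ax_immediate, ax_after_store)   # 6502 6502 8086
```

Both forms start out loading 6502, which is why the missing brackets hide the distinction. Only the "dw" form follows a later store to the location.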

Another quirk of MASM is its use of colons with labels. It insists on a colon after a label for an instruction, and it rejects one after a label for data (such as "number: dw 6502"). Other assemblers, such as NASM and TASM, aren't this finicky.

If you're having trouble getting your code to assemble correctly, don't be too hard on yourself. Experiment with alternative ways of coding things. MASM is a complex and bizarre assembler, and it doesn't always do what's logical.

Sometimes we need to deal with not just one, but many numbers in a table or an array. Pointers and indexes can be used to access these numbers. Here's how the BP register is used as a pointer to access the data in "table":

        table   dw      1, 10, 100, 1000, 10000 ;define words of data
        tblEnd  equ     $               ;"$" = current memory address

                mov     bp, offset table ;point bp at table
        ex10:   mov     ax, [bp]        ;get word pointed to by bp
                call    decout          ;display it
                add     bp, 2           ;point to next word
                cmp     bp, tblEnd      ;is pointer at end of table?
                jne     ex10            ;loop back if not

We can accomplish the same thing as above by using BP as an index, instead of a pointer:

                mov     bp, 0           ;initialize index to start of table
        ex20:   mov     ax, [bp+table]  ;get word indexed by bp
                call    decout          ;display it
                add     bp, 2           ;index to next word
                cmp     bp, 10          ;is index at end of table?
                jne     ex20            ;loop back if not
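To see that the pointer and index loops visit the same addresses, here is a rough Python model of both. It ignores segment registers entirely, and the table address 100h is an assumption chosen only for the illustration.

```python
import struct

TABLE = 0x100                       # assumed address of "table"
values = [1, 10, 100, 1000, 10000]
memory = bytearray(0x200)           # hypothetical block of RAM
struct.pack_into("<5H", memory, TABLE, *values)
TBL_END = TABLE + 2 * len(values)   # what "tblEnd equ $" would evaluate to

def load_word(addr):
    """Model of mov ax, [addr]."""
    return struct.unpack_from("<H", memory, addr)[0]

def walk_with_pointer():
    out = []
    bp = TABLE                          # mov bp, offset table
    while True:
        out.append(load_word(bp))       # mov ax, [bp]
        bp += 2                         # add bp, 2
        if bp == TBL_END:               # cmp bp, tblEnd / jne ex10
            return out

def walk_with_index():
    out = []
    bp = 0                              # mov bp, 0
    while True:
        out.append(load_word(bp + TABLE))   # mov ax, [bp+table]
        bp += 2                         # add bp, 2
        if bp == 10:                    # cmp bp, 10 / jne ex20
            return out

print(walk_with_pointer())          # [1, 10, 100, 1000, 10000]
print(walk_with_index())            # [1, 10, 100, 1000, 10000]
```

In the pointer version the register holds the full address. In the index version the table's address is a constant inside the instruction, and the register holds only the distance from the start of the table.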

We can even have two index registers. (Actually one is called a "base" register. In fact if you want to get carried away, you can think of the data segment register as a third index.) Here's how two index registers might be used:

                mov     si, 2*4                 ;select 3rd name from tbl
                call    showName
                . . .

                tbl     db      'Adok'
                        db      'Bonz'
                        db      'claw'

        ;Display the name indexed by si
        showName:
                mov     bp, 0                   ;initialize index
        sn20:   mov     al, [bp+si+tbl]         ;fetch character
                int     29h                     ;display it
                inc     bp                      ;next character
                cmp     bp, 4                   ;loop for 4 characters
                jne     sn20
                ret
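The address arithmetic in showName can be traced with a small Python model. As before, this is a sketch: the address 40h for "tbl" is an assumption, and the routine returns the characters instead of displaying them with int 29h.

```python
TBL = 0x40                          # assumed address of "tbl"
memory = bytearray(0x80)            # hypothetical block of RAM
memory[TBL:TBL + 12] = b"AdokBonzclaw"   # the three 4-byte names

def show_name(si):
    """Collect the 4-character name that starts si bytes into tbl."""
    chars = []
    bp = 0                              # mov bp, 0
    while True:
        al = memory[bp + si + TBL]      # mov al, [bp+si+tbl]
        chars.append(chr(al))           # stands in for int 29h
        bp += 1                         # inc bp
        if bp == 4:                     # cmp bp, 4 / jne sn20
            return "".join(chars)

print(show_name(2 * 4))             # claw
```

SI selects which name to use and BP steps through its characters. The processor adds the two registers and the displacement to form the address.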

Memory can be addressed with almost any combination of:

        seg_reg:[index+base+offset]             (offset = displacement)

These addressing modes are powerful, but there are many restrictions on which registers can be used. Most of these restrictions go away in 32-bit mode, but in 16-bit mode only four registers can be used for indexing, and only certain pairs of them can be used for double indexing. The legal addressing modes are listed on the second page of PCASM.TXT.
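As a rough guide to the 16-bit restrictions, the following Python sketch checks whether a combination of registers can appear inside the brackets. It is based on the standard 8086 rules (BX and BP as base registers, SI and DI as index registers) rather than copied from PCASM.TXT, and the function name is made up for this illustration. A displacement may be added to any legal combination, so it is left out of the check.

```python
BASES = {"bx", "bp"}                # base registers
INDEXES = {"si", "di"}              # index registers

def is_legal_16bit(registers):
    """Return True if these registers may be combined in a 16-bit address."""
    regs = [r.lower() for r in registers]
    if len(regs) == 0:
        return True                 # displacement only, e.g. [number]
    if len(regs) == 1:
        return regs[0] in BASES | INDEXES
    if len(regs) == 2:
        a, b = regs
        # exactly one base register paired with exactly one index register
        return (a in BASES and b in INDEXES) or (a in INDEXES and b in BASES)
    return False                    # three or more registers never works

print(is_legal_16bit(["bp", "si"]))     # True
print(is_legal_16bit(["si", "di"]))     # False
print(is_legal_16bit(["ax"]))           # False
```

This shows why [bp+si+tbl] assembles but something like [si+di+tbl] does not.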

Although they're not usually shown, remember that the segment registers are always involved in accessing memory. When data memory is accessed, the data segment (DS) register is normally, but not always, used. As we saw in the 2PLUS3 example, the DS register can be overridden by specifying another segment register:

                MOV     ES:[6], AL

A small warning: When the BP register is used to access data, the default segment register is not DS but the stack segment (SS) register. For the COM files used in this article, this makes little difference because DOS sets SS = DS when it starts the program. However, if you use an EXE file, be aware that SS is not normally the same as DS.
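The effect of the default segment can be sketched in Python. The function names and segment values below are made up for the example. The address formula (segment times 16 plus offset) is the usual real-mode rule, which this article does not spell out, so treat the sketch as a simplified model.

```python
def default_segment(registers, override=None):
    """Pick the segment register used for a memory operand."""
    if override is not None:
        return override.lower()         # e.g. the ES in  mov es:[6], al
    regs = [r.lower() for r in registers]
    return "ss" if "bp" in regs else "ds"

def physical_address(segment, offset):
    """Real-mode address: segment value times 16, plus the offset."""
    return (segment << 4) + offset

# COM file: DOS starts the program with SS = DS (values assumed here).
com = {"ds": 0x1234, "ss": 0x1234}
# EXE file: SS normally differs from DS (values assumed here).
exe = {"ds": 0x1234, "ss": 0x2000}

seg = default_segment(["bp"])           # "ss"
print(hex(physical_address(com[seg], 0x10)))     # 0x12350
print(hex(physical_address(com["ds"], 0x10)))    # 0x12350, same location
print(hex(physical_address(exe[seg], 0x10)))     # 0x20010
print(hex(physical_address(exe["ds"], 0x10)))    # 0x12350, a different location
```

In the COM case it doesn't matter which segment register is used. In the EXE case, [bp] without a DS override reads from the stack segment instead of the data segment.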

Boreal (aka: Loren Blaney)